Doris: A Tool for Interactive Exploration of Historic Corpora

Author

  • Sreya Guha
Abstract

Insights into social phenomena can be gleaned from trends and patterns in corpora of documents associated with those phenomena. Recent years have witnessed the use of computational techniques, mostly based on keywords, to analyze large corpora for these purposes. In this paper, we extend these techniques to incorporate semantic features. We introduce Doris, an interactive exploration tool that combines semantic features with information retrieval techniques to enable exploration of document corpora corresponding to a social phenomenon. We discuss the semantic techniques and describe an implementation on a corpus of United States (US) presidential speeches. We illustrate, with examples, how the ability to combine syntactic and semantic features in a visualization helps researchers gain insights into the underlying phenomenon.

Similar resources

Geomatics and Architectural Heritage: a Multi-layer Interactive Map of Tuscia-Italy

The main aims of this research are the design and implementation of a multilayered and interactive geomatic map of the cultural heritage of Tuscia, one of the richest and most complex cultural areas of Italy, thanks to the presence of different civilizations, from Etruscans and Romans to the Middle Ages. Its cultural heritage is very rich, valuable and above all diversified because including tan...

SCHNAPPER: A Web Toolkit for Exploratory Relation Extraction

We present SCHNÄPPER, a web toolkit for Exploratory Relation Extraction (ERE). The tool allows users to identify relations of interest in a very large text corpus in an exploratory and highly interactive fashion. With this tool, we demonstrate the ease-of-use and intuitive nature of ERE, as well as its applicability to large corpora. We show how users can formulate exploratory, natural language-...

Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents

Beyond its own merits, text classification (TC) has become a cornerstone of many applications. The work presented here is part of, and a prerequisite for, a project we have undertaken to create a corpus for the Arabic text process. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...

A Comparative Analysis of Metadiscourse Markers in the Result and Discussion Sections of Literature and Engineering Research Papers

This study compares metadiscourse markers in result and discussion sections of literature and engineering research papers. To this end, 40 research articles (20 literature and 20 engineering) are selected from two major international journals. Based on Hyland’s (2005) model of metadiscourse, the articles are codified in terms of frequency, percentage, and density of interactive and interactiona...

Semantic Pathways: A novel visualisation of varieties of English

Semantic Pathways is a corpus exploration tool with a unique visual interface in which keyword extraction and keyword-based document clustering have been implemented in order to facilitate insight forming. Semantic Pathways combines corpus comparison techniques from Corpus Linguistics with aesthetically driven design and interaction, to produce fluidly interactive information exploration. In add...

Publication date: 2017